Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema_registry_decode: support avro for decoding #3137

Merged
merged 3 commits into from
Jan 23, 2025

Conversation

rockwotj
Copy link
Collaborator

  • update changelog
  • schema_registry_decode: use hamba/avro as an alternative impl

I went back and forth on patching goavro, but this just seems simpler
and is much faster too. The tree/schema walking is a little complex but
here we are. Hopefully we can eventually just remove the code for goavro
at some point, but we have other code we'd need to switch around.
Copy link
Collaborator

@mihaitodor mihaitodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one Tyler! 😎 I left a few nits, feel free to 🐑 🚀

CHANGELOG.md Outdated Show resolved Hide resolved
name: "successful message with logical types",
input: "\x00\x00\x00\x00\x04\x02\x90\xaf\xce!\x02\x80\x80揪\x97\t\x02\x80\x80\xde\xf2\xdf\xff\xdf\xdc\x01\x02\x02!",
output: `{"int_time_millis":35245000,"long_time_micros":20192000000000,"long_timestamp_micros":62135596800000000,"pos_0_33333333":"!"}`,
hambaOutput: `{"int_time_millis":"9h47m25s","long_time_micros":"5608h53m20s","long_timestamp_micros":"3939-01-01T00:00:00Z","pos_0_33333333":"0.33"}`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably smth that might need to be revised in the future, but I'm wondering if timestamps should get some custom serialisation logic when using Hamba (guess it should also represent micros as a number).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having them as bloblang time types is better right? One could always walk the message and transform time types into unix micros right?

Happy to revisit before we ship a release.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what users expect in the serialised JSON, but in Bloblang we now have .timestamp(), so it shouldn't be an issue to go from a number to a timestamp. I don't have any strong preferences though

internal/impl/confluent/serde_hamba_avro.go Show resolved Hide resolved
@rockwotj
Copy link
Collaborator Author

rockwotj commented Jan 23, 2025

Thanks for the speedy review here Mihai! If you have other thoughts here I'd love to hear them (happy to create followup PRs), and would also like to get your thoughts on the best way to also work with all the extensions that Kafka Connect has.

@rockwotj rockwotj merged commit c36dcb6 into redpanda-data:main Jan 23, 2025
4 checks passed
@mihaitodor
Copy link
Collaborator

You're welcome! I think the decimal conversion should be OK now, so that covers what was needed here for goavro. Not sure if any new ones were introduced since then. I can have a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants